In [1]:
%matplotlib inline
This notebook shows how to use BigBang to investigate the activity of a particular project participant.
We will focus on Fernando Perez's role within the IPython community.
First, imports.
In [2]:
from bigbang.archive import Archive
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Let's get all the available data from the IPython community. For now, that means just the mailing lists. One day, BigBang will also get its issue tracker data! That will be very exciting.
In [3]:
url = "ipython-user"
arx = Archive(url,archive_dir="../archives")
Now let's isolate the messages sent by Fernando Perez.
Note that this filter only looks at the From field, so it selects messages from Fernando, not messages addressed to him.
In [4]:
# Keep rows whose From header contains "Fernando" (a missing header counts as no match)
fernandos = Archive(arx.data[arx.data.From.str.contains('Fernando', na=False)])
fernandos.data[:3]
Out[4]:
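The selection above is an ordinary pandas boolean mask over the From column. Here is a minimal sketch on a few made-up From values (hypothetical data, not taken from the archive) showing which rows a substring test keeps; Series.str.contains with na=False is the vectorized form of the test and treats a missing header as "no match" instead of raising.

```python
import pandas as pd

# Hypothetical sender headers; in the notebook these come from arx.data.From
data = pd.DataFrame({
    "From": [
        "Fernando Perez <fperez@example.org>",
        "Jane Doe <jane@example.org>",
        "Fernando Lopez <flopez@example.org>",
        None,  # a message with a missing From header
    ]
})

# True wherever "Fernando" appears anywhere in the header;
# na=False maps the missing header to False
mask = data["From"].str.contains("Fernando", na=False)

fernando_rows = data[mask]
print(fernando_rows["From"].tolist())
```

As in the real data, a bare substring test also catches the unrelated "Fernando Lopez".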
Note that our way of finding Fernando Perez was not very precise. We've picked up another Fernando.
In [5]:
# Iterating over a DataFrame yields its column labels: here, the sender addresses
list(fernandos.get_activity().columns)
Out[5]:
In future iterations, we will use a more precise entity recognition technique to find Fernando. This will have to do for now.
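Short of real entity recognition, one cheap step toward more precise matching is to require the full name rather than the first name alone. The sketch below is an illustrative assumption, not BigBang functionality: it uses a case-insensitive regular expression that accepts "Perez" with or without the accent, on hypothetical sender strings. It would still miss messages whose From header lacks Fernando's name.

```python
import pandas as pd

# Hypothetical sender headers (illustrative only)
senders = pd.Series([
    "Fernando Perez <fperez@example.org>",
    "Fernando Pérez <fperez@example.org>",
    "fernando perez <fperez@example.org>",
    "Fernando Lopez <flopez@example.org>",
    "Jane Doe <jane@example.org>",
])

# Require first and last name together (any whitespace between them),
# accept the accented é, and ignore case; na=False excludes missing headers
pattern = r"fernando\s+p[eé]rez"
mask = senders.str.contains(pattern, case=False, regex=True, na=False)

print(senders[mask].tolist())
```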
We will also need the data for all the emails that were not sent by Fernando.
In [6]:
# Everything else: the complement of the test used for fernandos
not_fernandos = Archive(arx.data[~arx.data.From.str.contains('Fernando', na=False)])
not_fernandos.data[:3]
Out[6]:
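Because not_fernandos is built from the negation of the same test, the two archives should split the original messages with no overlap and nothing left over. A quick sanity check of that property, sketched on hypothetical data with the ~ operator applied to a boolean mask:

```python
import pandas as pd

# Hypothetical messages; in the notebook this would be arx.data
data = pd.DataFrame({
    "From": ["Fernando Perez <fperez@example.org>",
             "Jane Doe <jane@example.org>",
             "John Roe <john@example.org>"],
    "Subject": ["a", "b", "c"],
})

mask = data["From"].str.contains("Fernando", na=False)
fernandos = data[mask]
not_fernandos = data[~mask]  # logical NOT of the same mask

# Every row lands in exactly one of the two subsets
assert len(fernandos) + len(not_fernandos) == len(data)
assert fernandos.index.intersection(not_fernandos.index).empty
```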
We now have two Archives derived from the original one: one with Fernando's messages and one with everyone else's. Each contains emails from many addresses, so to compare them we want a single activity count per date for each archive, summed across all senders.
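Judging from how it is used here (iterating over it lists senders, and .sum(1) totals across them), get_activity() returns a table with one row per date and one column per sender, holding message counts. The sketch below builds a table of that shape from hypothetical messages with plain pandas (groupby plus unstack). This is an assumption about the layout, not BigBang's actual implementation.

```python
import pandas as pd

# Hypothetical messages: one row per email
messages = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-02",
                            "2014-01-02", "2014-01-02"]),
    "From": ["alice@example.org", "bob@example.org", "alice@example.org",
             "alice@example.org", "carol@example.org"],
})

# Rows = dates, columns = senders, values = number of messages
activity = (messages.groupby(["Date", "From"]).size()
            .unstack("From", fill_value=0))

# Collapse the sender columns into one total per date (same as .sum(1))
total = activity.sum(axis=1)
print(total)
```

Iterating over such a frame, as in the list of senders printed earlier, yields its column labels, that is, the sender addresses.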
In [7]:
not_fernandos.get_activity().sum(1).values.shape
Out[7]:
In [8]:
nf = pd.DataFrame(not_fernandos.get_activity().sum(1))
In [9]:
f = pd.DataFrame(fernandos.get_activity().sum(1))
In [10]:
both = pd.merge(nf,f,how="outer",left_index=True,right_index=True,suffixes=("_nf","_f")).fillna(0)
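The outer merge matters because the two series need not cover the same dates. how="outer" keeps the union of both date indexes, and fillna(0) turns the gaps into zero counts. Since each frame wraps an unnamed Series, both have a single column labelled 0, so the suffixes produce the column names '0_nf' and '0_f' used below. A sketch on hypothetical daily totals:

```python
import pandas as pd

# Hypothetical daily totals; the two series cover different dates
nf = pd.DataFrame(pd.Series([5, 3, 4], index=pd.to_datetime(
    ["2014-01-01", "2014-01-02", "2014-01-03"])))
f = pd.DataFrame(pd.Series([2, 1], index=pd.to_datetime(
    ["2014-01-02", "2014-01-04"])))

# Both frames have one column labelled 0, so the suffixes disambiguate;
# dates missing from one side become NaN, then 0
both = pd.merge(nf, f, how="outer", left_index=True, right_index=True,
                suffixes=("_nf", "_f")).fillna(0)

print(both)
```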
Let's make a stackplot of this data to see how much of the conversation on the ipython-user list has come from Fernando over time.
In [11]:
fig = plt.figure(figsize=(12.5, 7.5))
# Stack Fernando's per-date message count underneath everyone else's
d = np.vstack((both['0_f'],
               both['0_nf']))
plt.stackplot(both.index.values, d, linewidth=0,
              labels=['Fernando', 'Others'])
fig.axes[0].xaxis_date()
plt.legend(loc='upper left')
plt.show()
The lower band represents Fernando's contributions to the list; the upper band represents the contributions of everyone else.
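To read the same picture as a number rather than a band height, we can divide Fernando's count by the total for each date. The sketch below does this on a hypothetical frame with the same column names as both; the sample values are made up. Dates with no messages at all give 0/0, which pandas reports as NaN rather than raising.

```python
import pandas as pd

# Hypothetical merged frame with the same column names as `both`
both = pd.DataFrame({"0_nf": [5.0, 3.0, 0.0, 4.0],
                     "0_f": [0.0, 2.0, 0.0, 1.0]},
                    index=pd.date_range("2014-01-01", periods=4, freq="D"))

total = both["0_f"] + both["0_nf"]

# Fernando's share of each day's messages (NaN on days with no messages)
daily_share = both["0_f"] / total

# Fernando's share over the whole period
overall_share = both["0_f"].sum() / total.sum()
print(daily_share)
print(f"overall: {overall_share:.2%}")
```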